NSF PAR Search | NSF Public Access Repository

Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning

Yang, Tong; Cen, Shicong; Wei, Yuting; Chen, Yuxin; Chi, Yuejie (December 2024, 38th Conference on Neural Information Processing Systems)

Full Text Available
Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Cen, Shicong; Wei, Yuting; Chi, Yuejie (January 2024, Journal of Machine Learning Research)

Full Text Available
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

https://doi.org/10.1137/21M1456789

Zhan, Wenhao; Cen, Shicong; Huang, Baihe; Chen, Yuxin; Lee, Jason D.; Chi, Yuejie (June 2023, SIAM Journal on Optimization)

Full Text Available
Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization

https://doi.org/10.1109/CDC51059.2022.9993175

Cen, Shicong; Chen, Fan; Chi, Yuejie (January 2022, IEEE Conference on Decision and Control (CDC))

Full Text Available
Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

https://doi.org/10.1287/opre.2021.2151

Cen, Shicong; Cheng, Chen; Chen, Yuxin; Wei, Yuting; Chi, Yuejie (January 2022, Operations Research)

Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization—an algorithmic scheme that encourages exploration—and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain limited even for the tabular setting. This paper develops nonasymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly—even quadratically, once it enters a local region around the optimal policy—when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation. Our convergence results accommodate a wide range of learning rates and shed light upon the role of entropy regularization in enabling fast convergence.
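As a rough illustration of the method the abstract describes, the following sketch runs entropy-regularized NPG on a small tabular discounted MDP with exact policy evaluation. It assumes the commonly stated closed-form update for softmax parameterization, pi_new(a|s) proportional to pi(a|s)^(1 - eta*tau/(1-gamma)) * exp(eta*Q_tau(s,a)/(1-gamma)), with learning rate 0 < eta <= (1-gamma)/tau. That formula is written from general knowledge of the setting, not taken from this listing, and the function names (`soft_q`, `npg_step`, `run_npg`) and array layouts are hypothetical.

```python
import numpy as np

def soft_q(P, r, pi, gamma, tau):
    """Exact evaluation of the entropy-regularized (soft) Q-function.

    Assumed layouts: P has shape (S, A, S), r and pi have shape (S, A).
    """
    S = r.shape[0]
    # State-to-state transition matrix under policy pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Expected one-step reward plus the entropy bonus tau * H(pi(.|s)).
    reward_pi = np.sum(pi * (r - tau * np.log(pi)), axis=1)
    # Solve the linear system V = reward_pi + gamma * P_pi @ V.
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, reward_pi)
    return r + gamma * (P @ V)

def npg_step(pi, Q, eta, gamma, tau):
    """One entropy-regularized NPG update, computed in log space."""
    log_pi = (1.0 - eta * tau / (1.0 - gamma)) * np.log(pi) + eta * Q / (1.0 - gamma)
    log_pi -= log_pi.max(axis=1, keepdims=True)  # numerical stability
    new_pi = np.exp(log_pi)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

def run_npg(P, r, gamma, tau, eta, iters):
    """Iterate from the uniform policy; return the final policy and its soft Q."""
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)
    for _ in range(iters):
        Q = soft_q(P, r, pi, gamma, tau)
        pi = npg_step(pi, Q, eta, gamma, tau)
    return pi, soft_q(P, r, pi, gamma, tau)
```

With eta = (1-gamma)/tau the exponent on the old policy vanishes and each step becomes pi_new proportional to exp(Q_tau/tau), which is the soft policy iteration the abstract mentions. The sketch assumes every policy entry stays strictly positive, so very small tau on large problems may need extra care with `np.log`.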
Full Text Available
Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Cen, Shicong; Wei, Yuting; Chi, Yuejie (January 2021, Advances in Neural Information Processing Systems 34)

Full Text Available
Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction

Li, Boyue; Cen, Shicong; Chen, Yuxin; Chi, Yuejie (January 2020, Journal of Machine Learning Research)
Full Text Available
